Systems and Methods for File Loading

ABSTRACT

The application describes systems and methods for loading a data file with a data pad having a pattern that enables identification of the data file during certain file operations. In one aspect, a file loading system comprises a datastore configured to store a plurality of data files where the plurality of data files include a plurality of original data files and at least one loaded data file. The system also includes a removable media storage device capable of interfacing with the datastore. The system further includes a processor arranged to access the plurality of data files in the datastore and convert a first original data file of the plurality of original data files into a first loaded data file. The first loaded data file includes information and a first data pad of added data. The first data pad includes a first pattern of data elements. The processor also monitors operations associated with the plurality of data files to detect a first data transfer operation from the datastore to the removable media storage device. Furthermore, the processor identifies a data file associated with the first data transfer operation as the first loaded data file by detecting the first pattern of data elements within the data file associated with the first data transfer operation.

REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 14/983,786, filed on Dec. 30, 2015, which is a continuation of U.S. patent application Ser. No. 13/314,413, filed on Dec. 8, 2011, which claims priority to and the benefit of U.S. Provisional Patent Application No. 61/421,014, filed on Dec. 8, 2010, and entitled “Systems and Methods for File Loading.” The entire contents of the above-referenced applications are incorporated herein by reference.

FIELD OF THE DISCLOSURE

The disclosure relates generally to systems and methods for data file handling. More particularly, in various aspects, the disclosure relates to data file loading.

BACKGROUND

Recently, there have been incidents where large amounts of electronic data and electronic files have been extracted from secure computing systems and networks by users with removable media devices. For example, in the Wikileaks incident, a U.S. Army specialist was able to download thousands of confidential and secret files to CD-ROMs via a computer terminal, which he then allegedly carried outside a secure facility.

In the past, before the proliferation of electronic information systems, it was not possible for one individual to access and transport such a large volume of secret documents. The physical size and volume of such documents in printed form prevented an individual from conveniently or practically transporting such documents without raising awareness or triggering detection by security personnel.

Existing data security measures typically rely on an array of physical and electronic security measures to prevent the release of sensitive personal, company, and/or government information. Many security measures are focused on preventing intruders from breaking physical and electronic security.

Electronic security measures include encryption, authentication, firewalls, passwords, virus detection, Trojan horse detection, and other network security tools. Physical security mechanisms include locked rooms, fences, secure facilities, background checks, cameras, badges, and personnel searches. Most security mechanisms provide perimeter security to prevent unauthorized entry and/or egress from a secure facility or electronic computing system.

Unfortunately, existing security measures have proven inadequate to prevent personnel with access to large volumes of sensitive electronic data from conveniently downloading such data and/or files to removable media, which can then be easily concealed for physical transport from a secure facility.

SUMMARY

The application, in various embodiments, addresses the deficiencies of current information security systems by preventing the convenient distribution of large volumes of electronic data.

In one aspect, the present disclosure uses an electronic and/or computer system to add data (e.g., a pad and/or random data) to a data file to increase the size of the file such that certain file operations (e.g., download, transfer, copy, attach, and so on) become time consuming, costly, and/or use large amounts of data storage space. The amount of data added to a file can be dynamic depending on the storage and/or processing capabilities of the computing system and/or network where the information resides. For a system with more processing power and/or more data storage, more data may be loaded into select data files. As processing power and storage expand over time, the amount of data added to select files can be proportionally increased. Thus, the invention makes it substantially more difficult for a person to extract large volumes of data files without increased data storage and processing power and, thereby, an increased possibility of detection.

In one aspect, the amount of padding added to a file may correspond to the data storage and/or processing power of a computer system. In another aspect, the amount of padding added to a file may correspond to the degree of secrecy and/or value of the information within a file. In yet another aspect, the amount of padding added to a file may correspond to the amount of processing power and/or storage capability associated with known removable media storage devices (e.g., memory sticks, USB memory devices, CD-ROMs, CD-RWs, disks, and the like), and/or the amount of storage capacity of a computer system, network, and/or database (i.e., datastore). In a further aspect, the amount of padding added to a file may depend on a combination of factors, including one or more of the factors discussed above.

In another aspect, a file loading system comprises a datastore for storing a plurality of data files, where each of the plurality of data files includes information, and a processor arranged to: access the plurality of data files in the datastore, and load a data pad into one or more of the plurality of data files to increase the size of the one or more of the plurality of data files. By increasing the size of one or more data files to particular amounts, the ability to transfer one or more data files to certain portable media storage devices or via a network transfer is inhibited and/or delayed, allowing for easier detection and/or prevention of unauthorized data transfers.

DRAWINGS

The foregoing and other objects and advantages of the disclosure will be appreciated more fully from the following further description thereof, with reference to the accompanying drawings. The skilled person in the art will understand that the drawings, described below, are for illustration purposes only. The drawings are not intended to limit the scope of the applicant's teaching in any way.

FIG. 1 includes a diagram of a system according to an illustrative embodiment of the invention;

FIG. 2 includes a functional block diagram of a computer shown in FIG. 1 according to an illustrative embodiment of the disclosure;

FIG. 3 includes a diagram of an electronic file loading process according to an illustrative embodiment of the disclosure;

FIG. 4 includes a diagram of an electronic file loading process including data interleaving according to an illustrative embodiment of the disclosure;

FIG. 5 includes a diagram of an electronic file loading process including a pad generator according to an illustrative embodiment of the disclosure; and

FIG. 6 includes an exemplary data pattern of a file pad according to an illustrative embodiment of the disclosure.

DESCRIPTION

While the applicant's teachings are described in conjunction with various embodiments, it is not intended that the applicant's teachings be limited to such embodiments. On the contrary, the applicant's teachings encompass various alternatives, modifications, and equivalents, as will be appreciated by those of skill in the art.

The application describes systems and methods for preventing the distribution of large volumes of electronic data by loading selected sensitive files with pad data to increase the size of the files such that file transfer, distribution, or downloading to removable media storage devices is more cumbersome.

FIG. 1 includes a diagram of an information system 100. The information system 100 includes a user 102 having removable media 104. The removable media may include any type of removable and/or portable data storage device such as, without limitation, a flash drive, memory stick, DVD, CD-ROM, CD-RW, wireless memory device, floppy disk, portable hard disk, tape drive, and solid state memory device. The system 100 includes a network 108, a datastore 112, computer 106, computer 118, a firewall 114, another network 110, and computer 116. The network 110 may be a network such as the Internet and/or an Ethernet associated with a person, company, facility, building, government entity, and the like. The network 108 may be a private network. The network 110 may include the network 108. The networks 108 and 110 may include telecommunications, wired, and/or wireless network components/infrastructure.

In one aspect, the network 108 includes a firewall 114 that provides secure access control to and/or from network 108. Datastore 112 may include a database that stores electronic information and/or data. The computers 118, 106, and 116 may include personal computers and/or network clients associated with one or more users such as user 102.

FIG. 2 includes a functional block diagram of a general purpose computer system, e.g., a computer, for performing the functions of the computer 106, 118, and/or 116 of FIG. 1 according to an illustrative embodiment of the disclosure. The exemplary computer system 200 includes a central processing unit (CPU) 202, a memory 204, and an interconnect bus 206. The CPU 202 may include a single microprocessor or a plurality of microprocessors for configuring computer system 200 as a multi-processor system. The memory 204 illustratively includes a main memory and a read-only memory. The computer 200 also includes the mass storage device 208 having, for example, various disk drives, tape drives, etc. The main memory 204 also includes dynamic random access memory (DRAM) and high-speed cache memory. In operation, the main memory 204 stores at least portions of instructions and data for execution by the CPU 202.

The mass storage 208 may include one or more magnetic disk or tape drives or optical disk drives or memory sticks, for storing data and instructions for use by the CPU 202. At least one component of the mass storage system 208, preferably in the form of a disk drive or tape drive, stores the database used for processing data and/or electronic medical records of the system 100. The mass storage system 208 may also include one or more drives for various portable media, such as a floppy disk, a compact disc read only memory (CD-ROM, DVD, CD-RW, and variants), or an integrated circuit non-volatile memory adapter (i.e., PCMCIA adapter) to input and output data and code to and from the computer system 200.

The computer system 200 may also include one or more input/output interfaces for communications, shown by way of example, as interface 210 for data communications via the network 212 (or network 114). The data interface 210 may be a modem, an Ethernet card, or any other suitable data communications device. To provide the functions of a computer 102 according to FIG. 1, the data interface 210 may provide a relatively high-speed link to a network 212 (or network 114 of FIG. 1), such as an intranet, internet, or the Internet, either directly or through another external interface 116. The communication link to the network 212 may be, for example, optical, wired, or wireless (e.g., via satellite or cellular network). Alternatively, the computer system 200 may include a mainframe or other type of host computer system capable of Web-based communications via the network 212. The computer system 200 may include software for operating a network application such as a web server and/or web client.

The computer system 200 also includes suitable input/output ports that may interface with a portable data storage device, or use the interconnect bus 206 for interconnection with a local display 216 and keyboard 214 or the like serving as a local user interface for programming and/or data retrieval purposes. The display 216 may include a touch screen capability to enable users to interface with the system 200 by touching portions of the surface of the display 216. Server operations personnel may interact with the system 200 for controlling and/or programming the system from remote terminal devices via the network 212.

The computer system 200 may run a variety of application programs and store associated data in a database of mass storage system 208. One or more such applications may include file loading as described later herein with respect to FIGS. 3-6.

The components contained in the computer system 200 are those typically found in general purpose computer systems used as servers, workstations, personal computers, network terminals, and the like. In fact, these components are intended to represent a broad category of such computer components that are well known in the art.

As discussed above, the general purpose computer system 200 may include one or more applications that provide electronic file loading in accordance with embodiments of the invention. The system 200 may include software and/or hardware that implements a web server application. The web server application may include software such as HTML, XML, WML, SGML, PHP (Hypertext Preprocessor), CGI, and like languages.

The foregoing features of the disclosure may be realized as a software component operating in the system 200 where the system 200 is a Unix workstation or other type of workstation. Other operating systems may be employed such as, without limitation, Windows, MAC OS, and LINUX. In some aspects, the controller 102 software can optionally be implemented as a C language computer program, or a computer program written in any high level language including, without limitation, C++, Fortran, Java, or Visual BASIC. Certain script-based programs may be employed such as XML, WML, PHP, and so on. Additionally, general techniques for high level programming are known, and set forth in, for example, Stephen G. Kochan, Programming in C, Hayden Publishing (1983). The system 200 may use a DSP, for which programming principles are well known in the art.

As stated previously, the mass storage 208 may include a database. The database may be any suitable database system, including the commercially available Microsoft Access database, and can be a local or distributed database system. The design and development of suitable database systems are described in McGovern et al., A Guide To Sybase and SQL Server, Addison-Wesley (1993). The database can be supported by any suitable persistent data memory, such as a hard disk drive, RAID system, tape drive system, floppy diskette, or any other suitable system. The system 200 may include a database that is integrated with the system 200; however, it will be understood by those of ordinary skill in the art that in other embodiments the database and mass storage 208 can be an external element.

In certain embodiments, the system 200 may include an Internet browser program and/or be configured to operate as a web server. In some embodiments, the client and/or web server may be configured to recognize and interpret various network protocols that may be used by a client or server program. Commonly used protocols include Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Telnet, and Secure Sockets Layer (SSL), for example. However, new protocols and revisions of existing protocols may be frequently introduced. Thus, in order to support a new or revised protocol, a new revision of the server and/or client application may be continuously developed and released.

In one embodiment, the system 100 includes a network-based, e.g., Internet-based, application that may be configured and run on the system 200 and/or any combination of the other components of the system 100. The computer 106, 118, and/or 116 (or system 200) may include a web server running a Web 2.0 application or the like. Web applications running on the computer 106, 118, and/or 116 may use server-side dynamic content generation mechanisms such as, without limitation, Java servlets, CGI, PHP, or ASP. In certain embodiments, mashed content may be generated by the web browser 144 via, for example, client-side scripting including, without limitation, JavaScript and/or applets.

In certain embodiments, the computer 106, 118, and/or 116 may include applications that employ asynchronous JavaScript+XML (Ajax) and like technologies that use asynchronous loading and content presentation techniques. These techniques may include, without limitation, XHTML and CSS for style presentation, document object model (DOM) API exposed by a web browser, asynchronous data exchange of XML data, and web browser side scripting, e.g., JavaScript. Certain web-based applications and services may utilize web protocols including, without limitation, the Simple Object Access Protocol (SOAP) and representational state transfer (REST). REST may utilize HTTP with XML.

The computer 106, 118, and/or 116 may also provide enhanced security and data encryption. Enhanced security may include access control, biometric authentication, cryptographic authentication, message integrity checking, encryption, digital rights management services, and/or other like security services. The security may include protocols such as IPSEC and IKE. The encryption may include, without limitation, DES, AES, RSA, and any like public key or private key based schemes.

FIG. 3 includes a diagram of an electronic file loading process 300 according to an illustrative embodiment of the disclosure. According to the process, a file loading function 304 (which may be an application, hardware, or combination thereof), e.g., a file loader, converts an original file 302 into a loaded file 306 that includes the original data 308 from the original file 302 along with a pad 310 of added data. In one aspect, the pad 310 includes a known and/or derivable pattern or sequence of data elements that may be used by a network monitor, device monitor, or application to detect the transfer or another operation (e.g., copy, download, move, transfer, etc.) of the loaded file. For example, the computer 118, 106, and/or 116 may include a monitor program and/or application that monitors file system operations or other operations on computer 106. The monitor program may check files attached to emails or during a copying/transfer operation for certain data patterns to determine whether a loaded file is being operated on. In one aspect, the data pattern of a pad may be unique to a particular file, enabling the monitor to identify the particular file and/or information being operated on.
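The pattern check performed by such a monitor program can be sketched as follows. This is a minimal illustration only, assuming the pad is a short byte pattern repeated back to back and that the monitor holds a table mapping hypothetical file identifiers to their patterns; the function name `identify_loaded_file`, the table layout, and the `min_repeats` threshold are assumptions made for the example and are not taken from the disclosure.

```python
from typing import Dict, Optional


def identify_loaded_file(
    data: bytes,
    known_patterns: Dict[str, bytes],
    min_repeats: int = 4,
) -> Optional[str]:
    """Return the identifier whose pad pattern appears in data, else None.

    Assumption: a pad is the pattern repeated contiguously, so requiring
    several consecutive repetitions reduces accidental matches.
    """
    for file_id, pattern in known_patterns.items():
        if pattern and pattern * min_repeats in data:
            return file_id
    return None


# Hypothetical pattern table and a file observed during a transfer operation.
patterns = {"report-A": b"\xde\xad\xbe\xef", "report-B": b"\x01\x02\x03\x05"}
observed = b"original content" + patterns["report-B"] * 10
match = identify_loaded_file(observed, patterns)
```

Because each pattern in the table is tied to one identifier, a match both flags the operation as involving a loaded file and indicates which file is involved, even if the file has been renamed.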

A data file may be loaded at certain times and/or under certain conditions. For example, the file creator, via an application (e.g., the file loading function 304), may designate a file as “secret.” The file loading application 304 may then load the file 302 with data in pad 310. In another aspect, the file loading application 304 automatically loads a file with pad 310 data based on at least one of the computer system storage capacity, typical removable media storage capacity, computer system file transfer, or data download transfer rates. The level of sensitivity of information within a file may be included as metadata in a file, and/or the file loading application 304 may have access to a list/database indicating the level of sensitivity of files in a location, folder, datastore, and/or computer system. The size of the pad 310 can vary. For example, the size of the pad 310 may range from 1 bit to gigabytes, terabytes, and even higher. The size of the pad 310 may be limited only by the available memory of the host data storage medium, which may include a datastore, computer system, network of datastores, and the like.

The file loading application 304 may operate or be invoked only when certain operating system actions and/or application actions are initiated. For example, files within a certain folder of a file system may be designated for loading. While a user performs editing and/or other operations on a file within the folder, no loading of the file is initiated. However, if the user attempts to move, transfer, and/or download the file, cut/paste, or attach the file to an email (or other data transport application), the file loading application 304 may automatically load the file with a data pad 310 to increase the file size according to configured file sizing/loading rules. Thus, a user of a file (e.g., using an editor) would not experience delays or inefficiencies that could be caused by an application having to open, close, save (or perform another operation on) a large loaded file. Instead, the file can be loaded only prior to moving, transferring, copying, or cutting/pasting portions of the file. One advantage to this approach is that a nefarious user is prevented from using a metadata stripper or file cleaner application (e.g., iScrub) to remove padding (e.g., the data pad 310) from a file prior to transferring the file, while also allowing more efficient operations on a non-padded file by a legitimate user. In another aspect, an application 304 and/or file management application can load files and store the loaded files in a datastore when the files are initially created and/or received, but when an authorized application (e.g., MS Word) performs an operation on the file, the application 304 and/or file management application can restore the file to its original unloaded format prior to the authorized application's use. Once the authorized application is finished with its operation, the file can then be re-loaded into its loaded format for storage.

In another configuration, the file loading application 304 and/or a file management application may allow one or more authorized users to restore a loaded file to its original size and format to allow for authorized transfer of a file. In this instance, a user may be required to enter a password, enter a secret key, use a token, use biometric authentication, and/or use a smart card to authenticate themselves to the file management application or computer system.

In certain aspects, the amount of file padding depends on the size of the removable media that can interface with a computer system holding sensitive data files. For example, if the computer system supports a USB flash drive up to a capacity of 32 MB, the file loading application 304 may load a highly sensitive file with padding such that its total file size is greater than 32 MB. Because a 32 MB USB drive may actually have only 30 MB of storage, highly sensitive files may be loaded to greater than 30 MB. In such a scenario, the custodian of the highly sensitive files may desire to prevent the transfer of a particular file to a removable storage device. In another scenario, the custodian may be concerned with the transfer of a large number of files to removable storage devices. Thus, if the computer system includes 100 sensitive files, the file loading application may load each sensitive file to 1 MB such that it would require multiple 32 MB USB flash drives to download all of the sensitive files.
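The capacity-based sizing rule can be illustrated with a short sketch. The helper name `pad_bytes_to_exceed` and the use of decimal megabytes (1 MB = 1,000,000 bytes) are assumptions made for the example; the disclosure does not specify a byte convention or an implementation.

```python
MB = 1_000_000  # assumption: decimal megabytes


def pad_bytes_to_exceed(original_size: int, media_capacity: int) -> int:
    """Pad bytes needed so the loaded file is strictly larger than the media.

    Returns 0 when the original file already exceeds the media capacity.
    """
    target_size = media_capacity + 1
    return max(0, target_size - original_size)


# A 2 MB sensitive file on a system that accepts USB drives up to 32 MB.
pad_for_nominal = pad_bytes_to_exceed(2 * MB, 32 * MB)
# The same file sized against the 30 MB of storage actually usable.
pad_for_usable = pad_bytes_to_exceed(2 * MB, 30 * MB)
```

Sizing against the nominal capacity is the more conservative choice, since a file larger than 32 MB is also larger than the 30 MB that may actually be usable.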

In a system with thousands of sensitive files, a user would need many USB flash drives to download all of the sensitive files, or would need to use one or more USB flash drives many times to download all of the sensitive data files. Thus, the user's activities would be more readily detectable by other security mechanisms, especially, for example, if an employee and/or user is observed with many USB flash drives or observed repeatedly downloading files to one or more USB flash drives. Similarly, if a computer system includes a CD-RW interface, the file loading application 304 may load each sensitive file to a size of about 20 MB (assuming a CD-ROM and/or CD-RW capacity of about 650 MB). Thus, a CD-ROM or CD-RW can store about 32 files. If there are 100 sensitive files on the computer system, a user would need at least 4 CD-ROMs to download all 100 sensitive files. If the computer system includes thousands of sensitive files, a user would need hundreds of CD-ROMs or CD-RWs and spend a substantial amount of time downloading all of the files.
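The disc-count arithmetic above can be reproduced with a few lines of code. The function names are hypothetical, and the sketch assumes that a file cannot be split across media.

```python
import math


def files_per_medium(capacity_mb: int, loaded_file_mb: int) -> int:
    """Whole loaded files that fit on one medium (files are not split)."""
    return capacity_mb // loaded_file_mb


def media_needed(file_count: int, capacity_mb: int, loaded_file_mb: int) -> int:
    """Number of media required to hold file_count loaded files."""
    per_medium = files_per_medium(capacity_mb, loaded_file_mb)
    if per_medium == 0:
        raise ValueError("a single loaded file does not fit on the medium")
    return math.ceil(file_count / per_medium)


# Figures from the text: 650 MB discs, files loaded to about 20 MB each.
per_disc = files_per_medium(650, 20)
discs_for_100 = media_needed(100, 650, 20)
discs_for_10000 = media_needed(10_000, 650, 20)
```

The `ValueError` branch corresponds to the first scenario described earlier, in which a file is loaded beyond the capacity of the medium so that it cannot be copied to that medium at all.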

The amount of file loading may be adjusted dynamically. For example, over time, processing power, network data rates, and/or data storage capacity (of a computer system/network, datastore, and/or removable media) may increase. The file loading application 304 can account for such increases by increasing the amount of file loading. If the storage capacity, processing power, and/or transfer rate decrease for some reason, the file loading application 304 can decrease file loading of select files. If the sensitivity level associated with a file changes, the amount of file loading can be adjusted. It may occur that copies of the same file have different loading depending on the configuration of its host environment (e.g., host processing power, network data rate, peripheral media device download data rate, data storage capacity of system/network and/or removable media storage device).

In certain aspects, the computer system and/or file loading application 304 can limit the types of removable media storage devices that can operate with a computer system and/or data storage device, or block the operation of certain types of removable media storage. For example, a computer system may have a CD-ROM/CD-RW and two USB interfaces. The file loading application 304 and/or an operating system may be configured to allow only USB flash drives up to 32 MB to interface with a computer system, but inhibit the use of a writable CD-ROM (CD-RW) drive. In one approach, the file loading application 304 may detect and remove the driver and/or configuration for a writable CD-ROM (CD-RW). In another feature, the operating system (e.g., Windows, LINUX, MAC OS) is configured to prevent the use of certain removable media storage devices.

In certain aspects, the computer system and/or file loading application 304 can adjust the loading of select sensitive files based on the download or transfer rate of the removable media interface or the data transfer rate in a computer network. For example, a common technology for transferring digital video to a home PC has been the Institute of Electrical and Electronics Engineers (IEEE) 1394 standard. Also known as FireWire, it has a 400 Mbps data transfer rate.

USB 2.0, a newer version of USB with a 480 Mbps data transfer rate, is faster than IEEE 1394 (FireWire), earning it the nickname Hi-Speed USB. USB devices are typically operated at either 12 Mbps (for full-speed devices) or 1.5 Mbps (for devices with lower bandwidth needs). USB 2.0 enables more of those devices at once and also adds a new speed, which can use the entire 480 Mbps bandwidth that USB 2.0 provides for Hi-Speed devices. Such high speeds are critical in bandwidth-hungry applications like mass storage devices, although not all devices are capable of running at 480 Mbps. For example, a USB 2.0 mouse remains a low-speed device and is likely running at only 1.5 Mbps, but a USB 2.0 Hi-Speed CD-RW can take advantage of the new USB 2.0 high speeds and burn CDs much faster.

In certain embodiments, the file loading application 304 may load select sensitive files based on the transfer rate of certain removable media storage devices. For example, if only lower bandwidth USB devices are allowed to interface with a computer (at 1.5 Mbps), each sensitive file may be loaded to 1.5 MB (megabytes). Thus, it would take about 8 seconds to download each sensitive file. (Note: there are 8 bits per byte; therefore, a 1.5 MB file has 12 Mbits of data.) If there are 10,000 sensitive files, it would take about 80,000 seconds, which is about 22.2 hours, to download all files. Such an extended transfer of files over 22 hours, whether contiguous or segmented over multiple periods, would substantially increase the exposure of a user to detection by a network or computer monitor, or physical security personnel that may be observing the user's physical and/or electronic activity.
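The transfer-time figures in this example follow directly from the file size and the link rate, as the sketch below shows. It assumes decimal units (1 MB = 1,000,000 bytes, 1 Mbps = 1,000,000 bits per second) and ignores protocol overhead; the function name is hypothetical.

```python
def transfer_seconds(file_size_bytes: int, link_bits_per_second: int) -> float:
    """Idealized time to move one file over a link, ignoring overhead."""
    return file_size_bytes * 8 / link_bits_per_second


# Figures from the text: 1.5 MB files over a 1.5 Mbps low-speed USB link.
per_file_s = transfer_seconds(1_500_000, 1_500_000)
total_s = 10_000 * per_file_s
total_hours = total_s / 3600
```

Real transfers would typically be slower than this idealized estimate, which only lengthens the exposure described above.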

In certain embodiments, the file loading application 304 may load select sensitive files based on the data rate of the network on which a storage device and/or host computer system resides. For example, the network may use 10 Mbps Ethernet, 100 Mbps Ethernet (e.g., 100BaseT), or 1000 Mbps (Gigabit) Ethernet. Typically, about 50% of the Ethernet capacity may be usable. For example, with 100 Mbps Ethernet at about 50% capacity, the data transfer rate is about 4.7 to 6 MB/s (max). For Gigabit Ethernet, the data transfer rate (assuming about 50% capacity) would be about 47 to 60 MB/s. Thus, the file loading application 304 may load one or more sensitive files (within a 100BaseT Ethernet network) to about 60 MB, so that each file takes about 10 seconds to transfer at 6 MB/s. Thus, if there are 10,000 sensitive files, it would take about 100,000 seconds, or about 27.77 hours, to transfer all 10,000 files via the Ethernet network from the host computer system to another data storage location.
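The network example can be checked in the same way. The sketch below takes the usable throughput as an input rather than deriving it from the nominal link rate, and uses the 6 MB/s upper figure quoted in the text; the function name is hypothetical.

```python
def bulk_transfer_hours(
    file_count: int,
    loaded_file_mb: float,
    throughput_mb_per_s: float,
) -> float:
    """Idealized hours needed to move file_count loaded files over a network."""
    total_seconds = file_count * loaded_file_mb / throughput_mb_per_s
    return total_seconds / 3600


# Figures from the text: 10,000 files of about 60 MB each at about 6 MB/s.
hours_at_6 = bulk_transfer_hours(10_000, 60, 6.0)
# At the lower 4.7 MB/s figure, the same transfer takes longer still.
hours_at_4_7 = bulk_transfer_hours(10_000, 60, 4.7)
```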

In certain aspects, the file loading application 304 loads one or more sensitive files with an amount of padding depending on the sensitivity and/or value of the information within a data file. For example, a government entity may designate information in four categories such as top secret, secret, confidential, and nonpublic. Files designated as top secret may include highly sensitive information for which secrecy of the data is a priority. For top secret files, the file loading application 304 may load a file such that its size is greater than the storage capacity of certain removable (and/or writable) media storage devices, or of a size that requires a total transfer time through a network or to a removable media storage device that is greater than a minimal transfer time. For example, the file loading application 304 may be configured to load select sensitive files such that a total data transfer time of the file is greater than, for example, 1 hour. The minimal data file transfer time may be greater than about 1 second, 5 seconds, 10 seconds, 30 seconds, 1 minute, 10 minutes, 30 minutes, 1 hr, 2 hrs, 5 hrs, 10 hrs, 24 hrs, 48 hrs, 72 hrs, 1 week, 2 weeks, 1 month, or 1 year.
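Loading a file to meet a minimal transfer time amounts to inverting the transfer-time calculation. The following sketch is illustrative only: it assumes an idealized link with no overhead and decimal units, and the function name is hypothetical.

```python
def min_loaded_size_bytes(min_transfer_seconds: int, link_bits_per_second: int) -> int:
    """Smallest file size whose idealized transfer time exceeds the minimum."""
    # Bytes that can be moved in exactly the minimal time, plus one more byte.
    bytes_in_window = (min_transfer_seconds * link_bits_per_second) // 8
    return bytes_in_window + 1


ONE_HOUR = 3600
# A 1 hour minimum over a 1.5 Mbps low-speed USB link.
size_low_speed = min_loaded_size_bytes(ONE_HOUR, 1_500_000)
# The same minimum over a 480 Mbps Hi-Speed USB link.
size_hi_speed = min_loaded_size_bytes(ONE_HOUR, 480_000_000)
```

The gap between the two results (about 675 MB versus about 216 GB) shows why the amount of loading may need to track the fastest interface that the host system permits.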

As discussed previously, the file loading application 304 may utilize a combination of any one or more of the above factors to determine the appropriate file loading of a sensitive file.

FIG. 4 includes a diagram of an electronic file loading process 400 including data interleaving according to an illustrative embodiment of the disclosure. According to the process 400, a file loading function 404 (which may be an application, hardware, or combination thereof) converts an original file 402 into a loaded file 406 that includes the original data 408 from the original file 402 along with a pad 410 of added data. In one aspect, the pad 410 includes a known and/or derivable pattern or sequence of data elements that may be used by a network monitor, device monitor, or application to detect the transfer or operation (e.g., copy, download, move, etc.) of the loaded file. Various operations to create a loaded file 406 can be initiated and/or performed as described for the file 302 with respect to FIG. 3. One additional feature disclosed with respect to FIG. 4 is the ability to interleave the original data 408 with the pad 410 data such that the original data 408 is distributed throughout the loaded file. Interleaving has the advantage of reducing the ability of certain file stripping or cleaning applications to strip away the pad 410 and restore the loaded file to the size of the original file 402. In one aspect, the file loading application 404 uses an interleaving key 414 to determine the interleaved locations of portions of the original data 408 within the loaded file 406. The file loading application 404 may use a pseudorandom function where the interleaving key 414 functions as a seed to determine a pseudorandom sequence and/or selection algorithm to determine where to place portions of the original data 408 within the loaded data file 406. The file loading application 404 may distribute data portions of equal or different sizes of the original data 408 throughout the loaded file 406.
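A keyed interleaving of this kind can be sketched as follows. This is one possible illustration and not the disclosed implementation: it interleaves at byte granularity, uses Python's `random.Random` seeded with an integer key as the pseudorandom function (which is deterministic but not cryptographically secure), and assumes the original length is known to whoever restores the file. All function names are hypothetical.

```python
import random
from typing import List


def _original_positions(key: int, total_len: int, original_len: int) -> List[int]:
    """Positions in the loaded file that hold original bytes, derived from key."""
    rng = random.Random(key)  # the interleaving key acts as the seed
    return sorted(rng.sample(range(total_len), original_len))


def interleave(original: bytes, pad: bytes, key: int) -> bytes:
    """Scatter the original bytes, in order, among the pad bytes."""
    total_len = len(original) + len(pad)
    slots = set(_original_positions(key, total_len, len(original)))
    src, fill = iter(original), iter(pad)
    return bytes(next(src) if i in slots else next(fill) for i in range(total_len))


def deinterleave(loaded: bytes, original_len: int, key: int) -> bytes:
    """Recover the original bytes from a loaded file using the same key."""
    positions = _original_positions(key, len(loaded), original_len)
    return bytes(loaded[i] for i in positions)


original_data = b"secret payload"
loaded_file = interleave(original_data, bytes(50), key=414)
restored = deinterleave(loaded_file, len(original_data), key=414)
```

Because the positions are regenerated from the key rather than stored, a tool that simply truncates the file or removes a trailing block of pad data would not recover the original contents intact.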

FIG. 5 includes a diagram of an electronic file loading process 500 including a pad generator 512 according to an illustrative embodiment of the disclosure. According to the process 500, a file loading function 504 (which may be an application, hardware, or combination thereof) converts an original file 502 into a loaded file 506 that includes the original data 508 from the original file 502 along with a pad 510 of added data. In one aspect, the pad 510 includes a known and/or derivable pattern or sequence of data elements that may be used by a network monitor, device monitor, or application to detect the transfer or operation (e.g., copy, download, move, etc.) of the loaded file. Various operations to create a loaded file 506 can be initiated and/or performed in the same manners as described with respect to files 302 and 402, and with respect to FIGS. 3 and 4.

Additionally, FIG. 5 illustrates the use of a pad generator 512, which may be implemented in hardware and/or software. The pad generator 512 may include a table of known pad 510 patterns, a random number generator, and/or a pseudorandom number generator. The pad generator 512 may use a key 516 (e.g., number, alphanumeric value, and the like) that determines the pattern of elements of the pad 510. In one aspect, the pattern of the pad 510 is unique to a select sensitive file to enable the file to be uniquely identified even if the file name is changed. In another aspect, the pattern of the pad 510 may be unique to a set of sensitive files. The set may be based on degree of sensitivity (top secret, secret, etc.) or the location of the sensitive file (particular facility, ship, office, department, entity, individual, role (e.g., President), etc.). As one option, the key 516 is stored in the loaded file 506 so that a monitor application checking the file is able to confirm the pattern. Including the key 516 in a loaded file can advantageously enable a monitor application to check the pad pattern and/or confirm the file's identity. While a nefarious user may also access the key, it will still take time and resources to access the key 516, exposing the user to detection. Such activity could be tracked and/or logged to enable detection by a monitor application. The interleaving key 514 may also be stored in the file 506.
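As a minimal sketch of a key-determined pad pattern, the following Python example derives pad bytes deterministically from a key and lets a monitor re-derive the pattern to confirm it. The function names and the choice of SHA-256 in counter mode as the pseudorandom generator are assumptions made for illustration only; the disclosure does not specify a particular generator.

```python
import hashlib


def generate_pad(key: bytes, length: int) -> bytes:
    """Derive `length` pad bytes deterministically from `key`."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        # Hash the key together with a running counter to get the next block.
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(out[:length])


def pad_matches(candidate: bytes, key: bytes) -> bool:
    """Return True if `candidate` is the pad that `key` would generate."""
    return candidate == generate_pad(key, len(candidate))
```

Because the same key always yields the same pattern, a distinct key per file (or per set of files) gives each file (or set) a distinct, checkable pad in this sketch, independent of the file name.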

The key 516 may alternatively or additionally be stored at another network location and/or with a network monitor application so that only the monitor application (e.g., file loading application 304) is able to confirm the proper pad 510 pattern. Where the original data 508 and pad 510 are interleaved, the original data 508 may effectively be hidden within a large amount of pad 510 data (i.e., a needle in a haystack). In one aspect, the key 516 or another interleaving key 514 may be used to determine the interleaving locations within the loaded file 406 and/or 506. Thus, only an application with access to the interleaving key 514 will be able to strip away the pad 510 to recover the original data 508. The monitor may also check the size of the loaded file and/or pad 510 to confirm that the size is the same as a set, pre-configured, and/or determined file loading size.

The file loading application 304 may load a data pad 310, 410, 510, or portion thereof, as metadata in a data file. Depending on the file format (see File Formats section herein), the file loading application 304 can load the data pad 510 in one or more file locations delimited as metadata or other data that an application (e.g., MS Word) which normally uses the file would not display to a user. For example, an MS Word document may be loaded with a pad 310, 410, 510 to increase the file size to 10 MB. Yet, when a user opens the file using MS Word, only the original information is displayed to the user during normal viewing via the editor display. In one approach, padding may be loaded as a property, custom property, author information, and/or tracked changes. Alternatively, the data pad 310, 410, 510, or a portion thereof, can be embedded in the information portion of a data file. In one configuration, the data pad 310, 410, 510 is appended to and displayed at the end of the information document. At the end of the displayed original (prior to file loading) document, a delimiting element and/or phrase may be included (e.g., “****Pad Information follows****”), followed by a pattern/sequence of elements of the pad 310, 410, 510. This approach has a disadvantage of including the pad 310, 410, or 510 in the displayed document, but has the advantage of including the pad in the delimited information portion of a data file, which can inhibit data stripping of the pad 310, 410, 510.
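The appended-pad configuration described above can be sketched, for plain text, roughly as follows. The delimiter string is the example given in the text; the function names and the newline framing around the delimiter are assumptions made for this illustration.

```python
DELIMITER = "****Pad Information follows****"


def append_pad(document_text: str, pad_text: str) -> str:
    """Append the delimiter and the pad after the displayed document text."""
    return f"{document_text}\n{DELIMITER}\n{pad_text}"


def split_loaded(loaded_text: str):
    """Return (document_text, pad_text); pad_text is None if no delimiter."""
    head, sep, tail = loaded_text.partition(f"\n{DELIMITER}\n")
    if not sep:
        return loaded_text, None
    return head, tail
```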

In one aspect, a file management application (e.g., Filesite) may have access to the pattern key 516 and/or the interleaving key 514. The file loading application 304 may be operated by a file management application.

In certain aspects, the file loading application 304 can change the format of a selected sensitive file. The change in format may include interleaving the original data 408 with a pad 410. The file loading application 304 may change the file extension of a formatted data file to enable an application to recognize that the file has a file loading format. An application (e.g., MS Word, Adobe Acrobat, and the like) may include functionality to restore the loaded file, or a portion thereof, to its original format. The file loading application 304 and/or a file management application may restore the file to its original format, i.e., the format prior to file loading and/or interleaving.

FIG. 6 includes an exemplary data pattern 602 of a file pad 600 according to an illustrative embodiment of the disclosure. While pattern 602 is illustrative, a pattern may be in the form of a binary sequence of 0s and 1s, a hexadecimal sequence, an alphanumeric sequence, or any type of representation of data within an electronic file.

The data file and/or pad for a loaded file may be encrypted to increase data security, not only to protect against disclosure of the information in a file, but also to inhibit a data stripping or cleaning application from conveniently stripping the pad data from a loaded file. Because certain operating systems, databases, and computer systems compress files for storage, the amount of loading and/or the size of a pad 310, 410, 510 can be configured and/or adjusted such that the compressed file size meets the system size requirement and/or threshold. Thus, if the size limit is 1 MB, a file may be loaded to a size of 10 MB because its compressed size will be about 1 MB.
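To illustrate how the pad size might be adjusted so that the compressed size meets a threshold, the following Python sketch grows a pad until the zlib-compressed size of the loaded data reaches a target. The function names, the use of zlib as a stand-in for whatever compression a storage system applies, and the use of random pad bytes are assumptions made for this sketch.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size of `data` after zlib compression (stand-in for system compression)."""
    return len(zlib.compress(data))


def load_to_compressed_size(original: bytes, threshold: int, step: int = 4096) -> bytes:
    """Append pad bytes until the compressed size is at least `threshold`."""
    loaded = bytearray(original)
    while compressed_size(bytes(loaded)) < threshold:
        # Random bytes are essentially incompressible, so each step adds
        # roughly `step` bytes to the compressed size as well.
        loaded.extend(os.urandom(step))
    return bytes(loaded)
```

A highly repetitive pad (for example, all zero bytes) would compress to almost nothing under this stand-in, so a much larger uncompressed pad would be needed to reach the same compressed size; that is the effect the 10 MB versus 1 MB example above describes.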

Certain document processing applications may set limits on the size of files that they will operate with. For example, MS Word 7 will only handle documents having a maximum size of 32 MB. In configurations where file loading requires loading a Word-based document to increase its size above 32 MB, the file loading application 304 or another application (e.g., MS Word) may split the document into multiple subdocuments. If the required file size is 50 MB, the application may split an original file into two subdocument files. The application can then manage the subdocument files using a master document. Although cumbersome, possibly enough to deter constant editing and referencing within the documents, such an approach enables file loading and use of such large files. Other more stable tools may be employed with Word, including INCLUDETEXT and RD fields. In one configuration, original data or information may be interleaved among pad data throughout multiple subdocuments. Certain other file formats and file systems may limit a file's size and, hence, the amount of file loading. For example, a file stored on a FAT32 file system will have a file size limit of 4 GB. Adobe Acrobat files have a 100 MB limit. Photoshop documents have a 2 GB size limit.
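The subdocument arithmetic in the example above (a 50 MB loaded size against a 32 MB limit yields two subdocuments) can be expressed in a short Python sketch. The function names are hypothetical, and splitting at raw byte boundaries is a simplification; a real word-processing document would need to be split at format-aware boundaries.

```python
import math


def subdocument_count(required_size: int, size_limit: int) -> int:
    """Number of subdocuments needed to hold `required_size` under `size_limit`."""
    return math.ceil(required_size / size_limit)


def split_into_subfiles(data: bytes, size_limit: int) -> list:
    """Split loaded data into consecutive pieces no larger than `size_limit`."""
    return [data[i:i + size_limit] for i in range(0, len(data), size_limit)]
```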

Again, the file loading application 304, another application, and/or a file management application (e.g., Sharepoint) may implement any one of various mechanisms to manage files having sizes larger than typical system limits. As discussed above, the application may create multiple subfiles and/or subdocuments of a loaded file such that the subfiles meet the file size limitations of an application that uses the file (e.g., MS Word), an operating system (e.g., Windows), and/or a file management system (e.g., Filesite, Sharepoint, etc.). However, in certain configurations and based on certain security requirements, the set file loading size (or size of loaded files) may not exceed standard application file size limits. Thus, file loading of selected sensitive files may not require additional file processing procedures to manage file size with respect to an application that normally uses a particular type of file.

In another feature, the file loading application or another application may convert files in one format to another format that supports larger data file size limits. For example, the application 304 may convert an MS Word document (having a 32 MB file size limit) to an Adobe Acrobat document (having a 100 MB file size limit) so that the information can be stored in a loaded file having a size of about 50 MB. In this way, it would not be necessary to create and manage two Word subdocuments having a total size of 50 MB. In another aspect, the application 304 may convert a document in one format having a file size limit to a second format having no file size limit or a higher file size limit.

The foregoing exemplary systems may be applied to various types of commercial, government, and/or personal information. For example, the U.S. Social Security Administration may wish to prevent convenient downloading of sensitive files including social security numbers and other personal information. It may choose to load the files to make the transfer of such files more cumbersome and, therefore, less susceptible to nefarious transfer. A bank or insurance company may wish to protect certain sensitive data files related to the business finances and/or finances of customers. A state healthcare administrator, hospital, healthcare provider, or doctor may wish to load medical records of patients and/or clients to make it harder for someone to transfer patient data files from a private/protected datastore. A government agency, such as the U.K. Ministry of Defence (MoD) or U.S. Department of Defense (DoD), may wish to load classified documents. An individual who stores important and sensitive data on a home computer system may want to load certain sensitive files to prevent hackers from extracting such files. An electronic library, cloud datastore, network datastore, and the like may want to load certain files with padding. Any entity concerned with preventing efficient transfer of important information in its custody can advantageously employ file loading to inhibit file transfer, copying, downloading, and the like, without a set degree of needed resources, effort, and time.

In another aspect, the application 304 may encapsulate an original data file and a pad data file into a new data storage file, i.e., a resulting loaded file. In one configuration, multiple data files may be combined with one or more pad data files to form a new data storage file.
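One possible way to picture such encapsulation is with a ZIP container, as in the following Python sketch. The choice of the ZIP format, the function name, and the member name pad.bin are assumptions made for illustration and are not specified by the disclosure. The members are stored uncompressed so that the pad is not compressed away.

```python
import io
import zipfile


def encapsulate(data_files: dict, pad: bytes) -> bytes:
    """Combine one or more data files with a pad file into one storage file."""
    buffer = io.BytesIO()
    # ZIP_STORED keeps every member uncompressed, preserving the loaded size.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name, content in data_files.items():
            archive.writestr(name, content)
        archive.writestr("pad.bin", pad)  # hypothetical member name
    return buffer.getvalue()
```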

File Formats

A file format is a particular way that information is encoded for storage in a computer file. Since a disk drive, or indeed any computer storage, can store only bits, the computer must have some way of converting information to 0s and 1s and vice-versa. There are different kinds of formats for different kinds of information. Within any format type, e.g., word processor documents, there will typically be several different formats. Sometimes these formats compete with each other. File formats are divided into proprietary and open formats.

Generality

Some file formats are designed for particular types of data: PNG files, for example, store bitmapped images using lossless data compression. Other file formats, however, are designed for storage of several different types of data: the Ogg format can act as a container for many different types of multimedia, including any combination of audio and/or video, with or without text (such as subtitles), and metadata. A text file can contain any stream of characters, encoded for example as ASCII or Unicode, including possible control characters. Some file formats, such as HTML, Scalable Vector Graphics and the source code of computer software, are also text files with defined syntaxes that allow them to be used for specific purposes.

Many file formats, including some of the most well-known file formats, have a published specification document (often with a reference implementation) that describes exactly how the data is to be encoded, and which can be used to determine whether or not a particular program treats a particular file format correctly. There are, however, two reasons why this is not always the case. First, some file format developers view their specification documents as trade secrets, and therefore do not release them to the public. Second, some file format developers never spend time writing a separate specification document; rather, the format is defined only implicitly, through the program(s) that manipulate data in the format.

Most modern operating systems, and individual applications, need to use all of these approaches to process various files, at least to be able to read ‘foreign’ file formats, if not work with them completely.

Filename Extension

One way of identifying file formats in use by several operating systems, including Windows, Mac OS X, CP/M, DOS, VMS, and VM/CMS, is to determine the format of a file based on the section of its name following the final period. This portion of the filename is known as the filename extension. For example, HTML documents are identified by names that end with .htm (or .html), and GIF images by .gif. In the original FAT filesystem, filenames were limited to an eight-character identifier and a three-character extension, which is known as an 8.3 filename. Many formats thus still use three-character extensions, even though modern operating systems and application programs no longer have this limitation. Since there is no standard list of extensions, more than one format can use the same extension, which can confuse the operating system and consequently users.
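As a small illustration of taking "the section of its name following the final period," the following Python sketch extracts a filename extension. The function name and the decisions to lower-case the result and to treat a leading-dot name as having no extension are assumptions of this sketch.

```python
def filename_extension(name: str) -> str:
    """Return the part of `name` after the final period, or '' if none."""
    base, dot, extension = name.rpartition(".")
    # Require a non-empty base so a name like ".profile" has no extension.
    return extension.lower() if dot and base else ""
```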

Internal Metadata

Another way to identify a file format is to store information regarding the format inside the file itself. Usually, such information is written in one (or more) binary string(s), tagged or raw texts placed in fixed, specific locations within the file. Since the easiest place to locate them is at the beginning of it, such area is usually called a file header when it is greater than a few bytes, or a magic number if it is just a few bytes long.

File Header

Meta-data contained in a file header is not necessarily stored only at the beginning of the file, but might be present in other areas too, often including the end of the file; that depends on the file format or the type of data it contains. Character-based (text) files have character-based human-readable headers, whereas binary formats usually feature binary headers, although that is not a rule: a human-readable file header may require more bytes, but is easily discernible with simple text or hexadecimal editors. File headers may not only contain the information required by algorithms to identify the file format alone, but also real metadata about the file and its contents. For example, most image file formats store information about image size, resolution, colour space/format and optionally other authoring information such as who made the image, when and where it was made, and what camera model and shooting parameters were used (if any; cf. Exif), and so on. Such metadata may be used by a program reading or interpreting the file both during the loading process and after that, but can also be used by the operating system to quickly capture information about the file itself without loading it all into memory.

Magic Number

One way to incorporate such metadata, often associated with Unix and its derivatives, is just to store a “magic number” inside the file itself. Originally, this term was used for a specific set of 2-byte identifiers at the beginning of a file, but since any undecoded binary sequence can be regarded as a number, any feature of a file format which uniquely distinguishes it can be used for identification. GIF images, for instance, always begin with the ASCII representation of either GIF87a or GIF89a, depending upon the standard to which they adhere. Many file types, most especially plain-text files, are harder to spot by this method. HTML files, for example, might begin with the string <html> (which is not case sensitive), or an appropriate document type definition that starts with <!DOCTYPE, or, for XHTML, the XML identifier, which begins with <?xml. The files can also begin with HTML comments, random text, or several empty lines, but still be usable HTML.
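The leading-bytes checks mentioned above can be sketched in Python as follows. The function name and return labels are hypothetical, and the sketch covers only the GIF, HTML, and XML prefixes named in the text; a practical detector would consult a much larger magic database.

```python
def sniff_format(data: bytes):
    """Guess a format from the first bytes of a file, or return None."""
    # GIF files begin with the ASCII text GIF87a or GIF89a.
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    head = data[:64].lower()  # the HTML markers are not case sensitive
    if head.startswith(b"<html") or head.startswith(b"<!doctype"):
        return "html"
    if head.startswith(b"<?xml"):
        return "xml"
    return None
```

As the text notes, a usable HTML file that begins with a comment or blank lines would not be recognized by a prefix test of this kind, which is why plain-text formats are harder to identify this way.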

The magic number approach offers better assurances that the format will be identified correctly, and can often determine more precise information about the file. Since reasonably reliable “magic number” tests can be fairly complex, and each file must effectively be tested against every possibility in the magic database, this approach is relatively inefficient, especially for displaying large lists of files (in contrast, filename and metadata-based methods need check only one piece of data, and match it against a sorted index). Also, data must be read from the file itself, increasing latency as opposed to metadata stored in the directory. Where filetypes don't lend themselves to recognition in this way, the system must fall back to metadata. It is, however, the best way for a program to check if a file it has been told to process is of the correct format: while the file's name or metadata may be altered independently of its content, failing a well-designed magic number test is a pretty sure sign that the file is either corrupt or of the wrong type. On the other hand, a valid magic number does not guarantee that the file is not corrupt or of a wrong type.

So-called shebang lines in script files are a special case of magic numbers. Here, the magic number is human-readable text that identifies a specific command interpreter and options to be passed to the command interpreter.
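A shebang line can be read with a few lines of Python, as in the following sketch. The function name is hypothetical, and treating everything after the interpreter path as a single option string is a simplifying assumption of this sketch.

```python
def parse_shebang(first_line: str):
    """Return (interpreter, options) for a '#!' line, or None otherwise."""
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    options = parts[1] if len(parts) > 1 else ""
    return interpreter, options
```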

Another operating system using magic numbers is AmigaOS, where magic numbers were called “Magic Cookies” and were adopted as a standard system to recognize executables in Hunk executable file format and also to let single programs, tools and utilities deal automatically with their saved data files, or any other kind of file types when saving and loading data. This system was then enhanced with the Amiga standard Datatype recognition system. Another method was the FourCC method, originating in OSType on Macintosh, later adapted by Interchange File Format (IFF) and derivatives.

External Metadata

A final way of storing the format of a file is to explicitly store information about the format in the file system, rather than within the file itself. This approach keeps the metadata separate from both the main data and the name, but is also less portable than either file extensions or “magic numbers”, since the format has to be converted from filesystem to filesystem. While this is also true to an extent with filename extensions (for instance, for compatibility with MS-DOS's three-character limit), most forms of storage have a roughly equivalent definition of a file's data and name, but may have varying or no representation of further metadata.

Zip files or archive files solve the problem of handling metadata. A utility program collects multiple files together along with metadata about each file and the folders/directories they came from all within one new file (e.g. a zip file with extension .zip). The new file is also compressed and possibly encrypted, but now is transmissible as a single file across operating systems by FTP systems or attached to email. At the destination, it must be unzipped by a compatible utility to be useful, but the problems of transmission are solved this way.

Mac OS Type-Codes

The Mac OS' Hierarchical File System stores codes for creator and type as part of the directory entry for each file. These codes are referred to as OSTypes, and for instance a HyperCard “stack” file has a creator of WILD (from HyperCard's previous name, “WildCard”) and a type of STAK. The type code specifies the format of the file, while the creator code specifies the default program to open it with when double-clicked by the user. For example, the user could have several text files all with the type code of TEXT, but which each open in a different program, due to having differing creator codes. RISC OS uses a similar system, consisting of a 12-bit number which can be looked up in a table of descriptions; e.g., the hexadecimal number FF5 is “aliased” to PoScript, representing a PostScript file.

Mac OS X Uniform Type Identifiers (UTIs)

A Uniform Type Identifier (UTI) is a method used in Mac OS X for uniquely identifying “typed” classes of entity, such as file formats. It was developed by Apple as a replacement for OSType (type & creator codes). The UTI is a Core Foundation string, which uses a reverse-DNS string. Common or standard types use the public domain (e.g. public.png for a Portable Network Graphics image), while other domains can be used for third-party types (e.g. com.adobe.pdf for Portable Document Format). UTIs can be defined within a hierarchical structure, known as a conformance hierarchy. Thus, public.png conforms to a supertype of public.image, which itself conforms to a supertype of public.data. A UTI can exist in multiple hierarchies, which provides great flexibility.

In addition to file formats, UTIs can also be used for other entities which can exist in OS X, including:

-   Pasteboard data
-   Folders (directories)
-   Translatable types (as handled by the Translation Manager)
-   Bundles
-   Frameworks
-   Streaming data
-   Aliases and symlinks
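The conformance hierarchy described for UTIs can be modelled, purely for illustration, as a small lookup table in Python. The table below contains only the three identifiers named in the text; the function and variable names are hypothetical, and this sketch does not use or represent Apple's actual API.

```python
# Each identifier maps to the supertypes it directly conforms to. A list is
# used because an identifier may belong to more than one hierarchy.
CONFORMS_TO = {
    "public.png": ["public.image"],
    "public.image": ["public.data"],
    "public.data": [],
}


def conforms_to(uti: str, supertype: str) -> bool:
    """Return True if `uti` equals or transitively conforms to `supertype`."""
    if uti == supertype:
        return True
    return any(conforms_to(parent, supertype)
               for parent in CONFORMS_TO.get(uti, []))
```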

OS/2 Extended Attributes

The HPFS, FAT12 and FAT16 (but not FAT32) filesystems allow the storage of “extended attributes” with files. These comprise an arbitrary set of triplets with a name, a coded type for the value and a value, where the names are unique and values can be up to 64 KB long. There are standardized meanings for certain types and names (under OS/2). One such is that the “.TYPE” extended attribute is used to determine the file type. Its value comprises a list of one or more file types associated with the file, each of which is a string, such as “Plain Text” or “HTML document”. Thus a file may have several types.

The NTFS file system also allows OS/2 extended attributes to be stored, as one of the file forks, but this feature is merely present to support the OS/2 subsystem (not present in XP), so the Win32 subsystem treats this information as an opaque block of data and does not use it. Instead, it relies on other file forks to store meta-information in Win32-specific formats. OS/2 extended attributes can still be read and written by Win32 programs, but the data must be entirely parsed by applications.

POSIX Extended Attributes

On Unix and Unix-like systems, the ext2, ext3, ReiserFS version 3, XFS, JFS, FFS, and HFS+ filesystems allow the storage of extended attributes with files. These include an arbitrary list of “name=value” strings, where the names are unique and a value can be accessed through its related name.

PRONOM Unique Identifiers (PUIDs)

The PRONOM Persistent Unique Identifier (PUID) is an extensible scheme of persistent, unique and unambiguous identifiers for file formats, which has been developed by The National Archives of the UK as part of its PRONOM technical registry service. PUIDs can be expressed as Uniform Resource Identifiers using the info:pronom/ namespace. Although not yet widely used outside of UK government and some digital preservation programmes, the PUID scheme does provide greater granularity than most alternative schemes.

MIME Types

MIME types are widely used in many Internet-related applications, and increasingly elsewhere, although their usage for on-disc type information is rare. These consist of a standardised system of identifiers (managed by IANA) consisting of a type and a sub-type, separated by a slash (for instance, text/html or image/gif). These were originally intended as a way of identifying what type of file was attached to an e-mail, independent of the source and target operating systems. MIME types identify files on BeOS, AmigaOS 4.0 and MorphOS, as well as store unique application signatures for application launching. In AmigaOS and MorphOS the MIME type system works in parallel with the Amiga-specific Datatype system.

File Format Identifiers (FFIDs)

File format identifiers are another, not widely used way to identify file formats according to their origin and their file category. The scheme was created for the Description Explorer suite of software. An identifier is composed of several digits of the form XX-YYYYYYY. The first part indicates the organisation origin/maintainer (this number represents a value in a company/standards organisation database), and the 2 following digits categorize the type of file in hexadecimal. The final part is composed of the usual file extension of the file or the international standard number of the file, padded left with zeros. For example, the PNG file specification has the FFID of 000000001-31-0015948, where 31 indicates an image file, 0015948 is the standard number and 000000001 indicates the ISO Organisation.
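Taking the PNG example above at face value, an FFID might be decomposed as in the following Python sketch. The three-part, hyphen-separated layout is inferred from that single example, and the function and field names are hypothetical; an extension-based final part would be kept as a string in the same way.

```python
def parse_ffid(ffid: str) -> dict:
    """Split an FFID such as '000000001-31-0015948' into its three parts."""
    organisation, category_hex, suffix = ffid.split("-")
    return {
        "organisation": int(organisation),   # index into an organisation database
        "category": int(category_hex, 16),   # file category, given in hexadecimal
        "suffix": suffix.lstrip("0"),        # standard number or extension, unpadded
    }
```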

File Content Based Format Identification

Another way to identify the file format is to look at the file contents for distinguishable patterns among file types. The file contents are a sequence of bytes, and a byte has 256 unique patterns (0-255). Thus, counting the occurrences of byte patterns, often referred to as the byte frequency distribution, gives distinguishable patterns to identify file types. There are many content-based file type identification schemes that use the byte frequency distribution to build representative models for file types and use statistical and data mining techniques to identify file types.
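A byte frequency distribution and a very simple nearest-model classifier can be sketched in Python as follows. The function names, the use of an L1 (sum of absolute differences) distance, and the toy models in the usage example are assumptions made for illustration; practical schemes use richer statistical or data mining models.

```python
from collections import Counter


def byte_frequency(data: bytes) -> list:
    """Return the relative frequency of each of the 256 byte values."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)
    return [counts.get(value, 0) / len(data) for value in range(256)]


def distance(p: list, q: list) -> float:
    """L1 distance between two frequency distributions."""
    return sum(abs(a - b) for a, b in zip(p, q))


def classify(sample: bytes, models: dict) -> str:
    """Return the name of the model whose distribution is closest to the sample."""
    observed = byte_frequency(sample)
    return min(models, key=lambda name: distance(observed, models[name]))
```

For example, with one model built from English-like text and another built from uniformly distributed bytes, a short text sample falls closer to the text model:

```python
models = {
    "text": byte_frequency(b"hello world hello"),
    "binary": byte_frequency(bytes(range(256))),
}
label = classify(b"world hello", models)
```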

File Structure

There are several ways to structure data in a file.

Unstructured Formats (Raw Memory Dumps)

Earlier file formats used raw data formats that consisted of directly dumping the memory images of one or more structures into the file. Developing tools for reading and writing these types of files is very simple. The limitations of the unstructured formats led to the development of other types of file formats that could be easily extended and be backward compatible at the same time.

Chunk-Based Formats

Electronic Arts and Commodore-Amiga pioneered this file format in 1985, with their IFF (Interchange File Format) file format. In this kind of file structure, each piece of data is embedded in a container that contains a signature identifying the data, as well as the length of the data (for binary encoded files). This type of container is called a “chunk”. The signature is usually called a chunk id, chunk identifier, or tag identifier. With this type of file structure, tools that do not know certain chunk identifiers simply skip those that they do not understand.
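The skip-what-you-do-not-understand behaviour of chunk-based formats can be illustrated with a short Python sketch. It assumes an IFF-style layout in which each chunk is a 4-byte identifier, a 4-byte big-endian length, the payload, and one padding byte when the payload length is odd; the function names and the chunk identifiers in the usage note are hypothetical.

```python
import struct


def make_chunk(chunk_id: bytes, payload: bytes) -> bytes:
    """Build one chunk: 4-byte id, big-endian length, payload, even padding."""
    padding = b"\x00" if len(payload) % 2 else b""
    return struct.pack(">4sI", chunk_id, len(payload)) + payload + padding


def iter_chunks(data: bytes):
    """Yield (chunk_id, payload) for each chunk in `data`."""
    offset = 0
    while offset + 8 <= len(data):
        chunk_id, length = struct.unpack_from(">4sI", data, offset)
        start = offset + 8
        yield chunk_id, data[start:start + length]
        # Advance past the payload and its optional padding byte.
        offset = start + length + (length & 1)


def read_known_chunks(data: bytes, known_ids: set) -> list:
    """Keep only chunks whose identifiers the reader understands."""
    return [(cid, payload) for cid, payload in iter_chunks(data) if cid in known_ids]
```

A reader that knows only the identifiers NAME and BODY would pass over any other chunk without error, because the length field tells it how far to skip.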

This concept has been adopted by RIFF (Microsoft-IBM equivalent of IFF), PNG, JPEG storage, DER (Distinguished Encoding Rules) encoded streams and files (which were originally described in CCITT X.409:1984 and therefore predate IFF), and Structured Data Exchange Format (SDXF). Even XML can be considered a kind of chunk-based format, since each data element is surrounded by tags which are akin to chunk identifiers.

Directory-Based Formats

This is another extensible format that closely resembles a file system (OLE Documents are actual filesystems), where the file is composed of ‘directory entries’ that contain the location of the data within the file itself as well as its signatures (and in certain cases its type). Good examples of these types of file structures are disk images, OLE documents and TIFF images.

Document File Format

A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers. There currently exist a multitude of incompatible document file formats.

It appears that XML is to be the basis for future document file formats. Examples of XML-based open standards are DocBook, XHTML and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).

In 1993 the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.

Page description languages such as PostScript and PDF have become the de facto standard for documents that a typical user should only be able to create and read, not edit. In 2001, the PDF format also became an international ISO/IEC standard (ISO 15930-1:2001, ISO 19005-1:2005, ISO 32000-1:2008).

HTML is the most widely used open international standard and is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000).

The default binary file format used by Microsoft Word (.doc) has become a widespread de facto standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.

In certain aspects, a file loading system comprises: a datastore for storing a plurality of data files where each of the plurality of data files includes information, and a processor arranged to: access the plurality of data files in the datastore, and load a data pad into one or more of the plurality of data files to increase the size of the one or more of the plurality of data files. The processor may change the file format of at least one of the plurality of data files from a first file format to a second file format.

In one configuration, the processor loads the data pad in response to an operation associated with the one or more of the plurality of data files. The operation may include at least one of copy, move, transfer, cut, paste, attach, delete, and send. The processor may automatically load the data pad in response to the operation. The size of the one or more data files may be determined based on at least one of the sensitivity of one or more of the data files, the data transfer rate of a network in which the datastore resides, the data capacity of one or more removable media storage devices capable of interfacing with the datastore, the data capacity of the datastore, and the processing power of the processor. The sensitivity may be based on at least one of the value of the information, the need for secrecy of the information, and the need for privacy of the information.
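As one hypothetical way of turning two of these factors into a number, the following Python sketch picks a loaded file size from a network transfer rate and, optionally, a removable media capacity. The policy shown (a minimum transfer time, and a size just above the media capacity), along with the function and parameter names, is an assumption made solely for illustration; the disclosure does not prescribe a formula.

```python
def estimate_loaded_size(transfer_rate_bytes_per_s: int,
                         min_transfer_seconds: int,
                         removable_capacity_bytes=None) -> int:
    """Estimate a loaded file size, in bytes, under the assumed policy."""
    # Large enough that a network transfer takes at least the chosen time.
    size = transfer_rate_bytes_per_s * min_transfer_seconds
    if removable_capacity_bytes is not None:
        # Also large enough that the file cannot fit on the removable media.
        size = max(size, removable_capacity_bytes + 1)
    return size
```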

The processor may interleave portions of the information in each data file throughout the data pad in each of the one or more of the plurality of data files. The interleaved portions of information of each data file may be interleaved based on an interleaving key. The data pad may include a pattern of data elements. The pattern of data elements may be based on a pseudorandom function. The pattern of data elements may be based on a pattern key.

A monitor may be arranged to monitor an operation associated with the one or more of the plurality of data files. The monitoring may include inspecting the data pad within the one or more of the plurality of data files. The monitor may store at least one key related to the one or more of the plurality of data files. The monitor may use the at least one key to confirm that the pattern of the data pad within the one or more of the plurality of data files is correct. The monitor may identify the one or more of the plurality of data files based on the data pad. The monitor may monitor the processor to determine whether at least one of a file stripping application is running, a file fragmentation application is running, and a file transfer application is running. The size of the one or more data files may be set such that the size does not exceed the maximum size limit associated with the format of the one or more data files.

In another aspect, a method for file loading comprises: storing a plurality of data files where each of the plurality of data files includes information, accessing the plurality of data files in the datastore, and loading a data pad into one or more of the plurality of data files to increase the size of the one or more of the plurality of data files.

In another aspect, a method for inhibiting the transfer of data comprises: storing a plurality of data files in a host datastore where each of the plurality of data files includes sensitive information, estimating a data size needed for a portion of the plurality of data files to inhibit their transfer from the datastore, and loading each of the plurality of data files with padding such that the data size of the portion of the plurality of data files is greater than or equal to the estimated data size.
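For illustration only, the estimating and loading steps of this method might be sketched in Python as follows. The function names, the rule that the target size must both exceed a removable medium's capacity and require at least a chosen minimum time to move over the network, and the even division of padding across files are assumptions made for this example; the disclosure does not prescribe a particular estimate.

```python
import math


def estimate_target_size(media_capacity_bytes: int,
                         network_rate_bytes_per_s: float,
                         min_transfer_seconds: float) -> int:
    """Estimate a combined size that would inhibit transfer.

    Assumed policy: the files should not fit on the removable medium and
    should take at least `min_transfer_seconds` to move over the network.
    """
    exceeds_media = media_capacity_bytes + 1
    slow_on_network = math.ceil(network_rate_bytes_per_s * min_transfer_seconds)
    return max(exceeds_media, slow_on_network)


def padding_per_file(file_sizes: list, target_size: int) -> list:
    """Split the shortfall between current and target size across the files."""
    if not file_sizes:
        raise ValueError("at least one file is required")
    shortfall = max(0, target_size - sum(file_sizes))
    base, extra = divmod(shortfall, len(file_sizes))
    # The first `extra` files take one additional byte so the total is exact.
    return [base + (1 if i < extra else 0) for i in range(len(file_sizes))]
```

After each file is loaded with its share of padding, the combined size of the files is greater than or equal to the estimated size, as the method requires. A practical variant might weight the shares by sensitivity or cap each file at the maximum size its format allows.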

It will be apparent to those of ordinary skill in the art that certain aspects involved in the operation of the controller 102 may be embodied in a computer program product that includes a computer usable and/or readable medium. For example, such a computer usable medium may consist of a read only memory device, such as a CD ROM disk or conventional ROM devices, or a random access memory, such as a hard drive device or a computer diskette, or flash memory device having a computer readable program code stored thereon.

Those skilled in the art will know or be able to ascertain using no more than routine experimentation, many equivalents to the embodiments and practices described herein. Accordingly, it will be understood that the invention is not to be limited to the embodiments disclosed herein, but is to be understood from the following claims, which are to be interpreted as broadly as allowed under the law.

What is claimed is:
 1. A file loading system comprising: a datastore configured to store a plurality of data files, the plurality of data files including a plurality of original data files and at least one loaded data file; a removable media storage device capable of interfacing with the datastore; and a processor arranged to: access the plurality of data files in the datastore, convert a first original data file of the plurality of original data files into a first loaded data file by adding a first data pad to original data of the first original data file, wherein the first data pad includes a first pattern of data elements, monitor operations associated with the plurality of data files to detect a first data transfer operation from the datastore to the removable media storage device, and identify a data file associated with the first data transfer operation as the first loaded data file by detecting the first pattern of data elements within the data file associated with the first data transfer operation.
 2. The system of claim 1, wherein identifying the data file is in response to detecting the first data transfer operation.
 3. The system of claim 1, wherein the first data transfer operation includes one of a copy, move, download, and transfer function.
 4. The system of claim 1, wherein the processor is configured to block the transfer of the first loaded data file to the removable media storage device.
 5. The system of claim 1, wherein an amount of added data is further determined based on at least one of the sensitivity of one or more of data files, the data transfer rate of a network in which the datastore resides, the data storage capacity of the removable media storage device, the data capacity of the datastore, and the processing power of the first computer.
 6. The system of claim 1, wherein the removable media storage device is concealably transportable by a human.
 7. The system of claim 1, wherein converting is based on the sensitivity of the first original data file, wherein the sensitivity is based on at least one of the value of the information, the need for secrecy of the information, and the need for privacy of the information.
 8. The system of claim 1, wherein the processor converts the first original data file in response to an operation associated with the one or more of the plurality of data files.
 9. The system of claim 1, wherein the processor interleaves portions of the original data with portions of the data pad in the first loaded data file.
 10. The system of claim 1, wherein the first pattern of data elements includes a known pattern of data elements.
 11. The system of claim 1, wherein the first pattern of data elements includes a derivable pattern of data elements.
 12. The system of claim 1, wherein the first pattern of data elements is unique to the first loaded data file.
 13. The system of claim 1, wherein the first pattern of data elements is based on a pseudorandom function.
 14. The system of claim 13, wherein the first pattern of data elements is based on a pattern key.
 15. The system of claim 1, wherein the processor stores at least one key related to the one or more of the plurality of data files.
 16. The system of claim 15, wherein the processor uses the at least one key to confirm that the pattern of the data pad within the one or more of the plurality of data files is correct.
 17. The system of claim 1, wherein the processor monitors the system to determine whether at least one of a file stripping application is running, a file fragmentation application is running, and a file transfer application is running.
 18. A method for file loading comprising: storing a plurality of data files in a datastore, the plurality of data files including a plurality of original data files and at least one loaded data file; interfacing the datastore with a removable media storage device; accessing the plurality of data files in the datastore; converting a first original data file of the plurality of original data files into a first loaded data file by adding a first data pad to original data of the first original data file, wherein the first data pad includes a first pattern of data elements; monitoring operations associated with the plurality of data files to detect a first data transfer operation from the datastore to the removable media storage device; and identifying a data file associated with the first data transfer operation as the first loaded data file by detecting the first pattern of data elements within the data file associated with the first data transfer operation.
 19. The method of claim 18 comprising identifying the data file in response to detecting the first data transfer operation.
 20. A non-transient computer readable medium containing program instructions for causing a computer to perform the method of: storing a plurality of data files in a datastore, the plurality of data files including a plurality of original data files and at least one loaded data file; interfacing the datastore with a removable media storage device; accessing the plurality of data files in the datastore; converting a first original data file of the plurality of original data files into a first loaded data file by adding a first data pad to original data of the first original data file, wherein the first data pad includes a first pattern of data elements; monitoring operations associated with the plurality of data files to detect a first data transfer operation from the datastore to the removable media storage device, and identifying a data file associated with the first data transfer operation as the first loaded data file by detecting the first pattern of data elements within the data file associated with the first data transfer operation.